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Abstract 

Implicit particle filters for data assimilation generate high-probability 
samples by representing each particle location as a separate function of a 
common reference variable. This representation requires that a certain un- 
derdetermined equation be solved for each particle and at each time an 
observation becomes available. We present a new implementation of im- 
plicit filters in which we find the solution of the equation via a random map. 
As examples, we assimilate data for a stochastically driven Lorenz system 
with sparse observations and for a stochastic Kuramoto-Sivashinski equation 
with observations that are sparse in both space and time. 
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1 Introduction 

In many applications in science and engineering the state of a system must 
be identified from an uncertain model supplemented by a stream of noisy 
data. Problems of this kind are typically formulated in terms of an ltd 
stochastic differential equation (SDE) 

dx(t) = f(x, t)dt + g(x, t)dW, (1) 

where t > is the independent variable, the state x(t) is a real m-dimensional 
column vector, /(x, t) is a real m-dimensional vector function g(x, t) is a real 
m x m matrix and dW is an m-dimensional Brownian motion (BM). The 
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probability density function (pdf) of the state at t = is known. As the 
solution of the SDE evolves, measurements 

b n = h(x n ,t n )+QV n (2) 

are recorded at times t n ,n = 1,2,3,..., where h is a A:-dimensional vector 
function (k < m), Q is a real k x k matrix, and V n is a /c-dimensional vector, 
whose components are k independent standard normal variates. We assume 
for the remainder of this paper that V n is independent of the BM in ([I]). 
The goal is to use both the model (pQ) and the observations ([2]) to determine 
the state of the system. 

The best estimate of the state x n = x{t n ) is, under wide conditions, the 
mean of the probability density defined by the SDE and conditioned on the 
data. In practice, the SDE must be discretized, as described, for example, 
in [SJ, so that we are dealing with a discrete recursion conditioned by dis- 
crete data. If the model flTJ) as well as the observations © are linear and if, 
in addition, the initial data are Gaussian, the conditional expectation can 
be computed via the Kalman-Bucy filter [22J. Strategies for tackling non- 
linear, non-Gaussian problems include the ensemble Kalman filter [11] . the 
extended Kalman filter [12 ,19], the unscented Kalman filter [21j . and varia- 
tional methods [33ti35]. These data assimilation strategies require Gaussian 
approximations or a linearization of the model and sometimes yield rather 
poor results if the nonlinearity is strong. We refer to [Hl[l6j[l7l[29j[39] for a 
review of various data assimilation algorithms, their applications and limi- 
tations. 

Particle filters [HISHQIIlQSllinilll] are sequential Monte Carlo tools that 
do not rely upon Gaussianity or linearity assumptions. In particle filters 
one works with a collection of "particles" (replicas of the system), whose 
empirical distribution approximates the conditional pdf at the n-th step. 
One moves all particles forward in time using some guess of the pdf at the 
next step, and then one corrects the guess by weighting the particles. The 
procedure is repeated at the next time an observation becomes available. 
The catch is that it is difficult to guess the next density accurately before 
doing the calculations; with most weighting schemes, many of the weights 
are therefore very small so that most of the computational effort is wasted 
on unlikely particles. As a consequence, the number of particles required 
can grow catastrophically with the dimension of the SDE [2|I38|. 

The implicit filter [5j|6] is designed to remedy this problem, i.e. it at- 
tempts to make nonlinear data assimilation feasible in high dimensional 
SDEs. The basic idea is to reverse the standard procedure. Rather than 
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generating a sample and then computing its probability, the implicit fil- 
ter finds regions of high probability taking the observations into account, 
and then looks for samples in these regions; this generates a thin beam of 
high probability particles, focussed on the observations, making the number 
of required particles manageable. The focusing is done by connecting the 
samples to a fixed reference density through a map that satisfies a data- 
dependet algebraic equation. This map is not unique and the efficiency of 
the sampling depends on the map one chooses. In the present paper we 
present an efficient implementation in which the reference variables are con- 
nected to the samples by a random map. We demonstrate its effectiveness 
on two test problems: a stochastically driven Lorenz model with sparse data 
and a stochastic Kuramoto-Sivashinski equation with data sparse in both 
space and time. We compare the implicit filter to a Sampling-Importance- 
Resampling (SIR) filter (see [HEME]) which constructs a prior density using 
the SDE and only later re-weights the particle positions by the observations. 

A different algorithm for guiding particles towards high probability re- 
gions has been presented in [10] . If the observations are linear and available 
at every time step, our filter reduces to a variant of the optimal SIR fil- 
ter [U0E2], as described below. 

2 Implicit sampling: basic ideas 

We start by reviewing the general framework of implicit sampling (see [5j[6j ) , 
explicitly allowing for the possibility that the observations are sparse in time, 
i.e., not necessarily available at every time step. We assume that the SDE ([I]) 
has been approximated by a difference scheme with time step S, in the form: 

x n+1 = R(x n , t n ) + G(x n , t n )AW n+ \ (3) 

where the functions R and G depend on the scheme we use, AW n+1 ~ 
iV(0, 5) and t n is shorthand for n5. For more details on the discretization, 
see [24] and the examples below. For simplicity, we assume a constant time 
step 5. The generalization of implicit sampling to higher order integration 
schemes is straightforward, see [5] and below. 

Assume we are given a collection of M particles with positions X?, j = 
1, . . . , M, whose empirical density approximates the conditional pdf at time 
t n , and suppose the next observation is available after r time steps, r a 
positive integer. Bayes' theorem can be used to show that the pdf of the 
jth particle at times t°, . . . , t n , t n+1 , . . . , t n+r , conditioned on the available 
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observations b n+r , is 

p(Xf-' n | ft 1 '-'") 

xP(IJ +1 | A?) . . . P(X] l + r - 1 | X" +r ~ 2 ) 
xPpf" +r | Xj l+r - 1 )P(6" +r | XV**) /Z" (4) 

where Z n is a normalization constant independent of the particles and X- 
is an abbreviation for X®, Xj, . . . , X™. Implicit sampling is a recipe for 
obtaining high probability samples from ([3]). 

For ease of notation we introduce the shorthand notation Xj for the r • m 
dimensional column vector [(X™ +1 ) T , . . . , (X™ +r ) T ] T (the state trajectory of 
the particle) and define, for each particle, a function Fj(Xj) by 

expi-FjiXj)) = P(X™ +1 | X?) . . . P(X] +r - 1 | x; +f '- 2 ) 

xP(X" +r | X" +r - 1 )P(5 n+r | X" +r ). (5) 

To obtain a sample we solve the algebraic equation 

Fi<Xj)-*i = \$& ( 6 ) 

where £j is a realization of a random variable £, drawn from a given, fixed 
reference density, say a r • m dimensional, multivariate normal distribution 
P^ = M(0,I). The additive, deterministic factor (f)j is needed to make the 
equation Fj(Xj) = (,j£,j/2 solvable (the left-hand-side is real, but the right- 
hand-side is non- negative) . The choice 

4>j = minPj, (7) 

where minPj is the global minimum of Fj, will do the job, and this is 
the choice we make in the present paper. We solve ([6]) for each particle 
because the functions Fj vary from particle to particle due to different pa- 
rameters X^. 

The variable £ on the right hand side of ([6]) is known and easy to sample, 
and by definition most of its samples will be high-probability samples near 
the origin. By equations ([6]) and (J7]) the corresponding values of Fj(Xj) 
will be near the minimum of Fj and therefore will have a high probability, 
so that with high probability we will have high probability samples. The 
probability density of the samples Xj corresponds to the "prior density" in 
the usual Bayesian sampling, but it is not a prior density in the usual sense, 
because the new positions of the particles are obtained by solving different 
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equations, rather than by sampling a common prior. The prior here is a 
parametrized family of functions of the reference variable. 

The empirical density defined by the new particle positions differs from 
the target density so that each sample must be weighted by the ratio of 
its probability with respect to the target density to its proposal probability 
[HEHllin]. Using P n+1 {Xj) and Pxj{Xj) as a shorthand notation for the 
target density (JI|) and the density defined by (J6|) respectively, we can obtain 
the weight w™ +1 of the particle X; at time t n+1 , i.e. its probability with 
respect to the target density: 
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n exp(-0.5fffr -</>,-) 

(X W'a ; — 7r — ; J, 

i exp(-0.5£j&) 

oc wj exp(—(f)j) J, (8) 



where J = |det dXj /d£\ is the Jacobian of the map. Having obtained the 
weights of all the particles, we normalize the weights so that their sum equals 
one. The variability of the weights modifies the reference density. The 
weights can be eliminated by resampling. Various resampling strategies and 
algorithms are discussed in pQ. Upon resampling, all particles have equal 
weights, so that, in particular, it is legitimate to omit the factor in 
equation pj). 

For future reference, we rewrite the function Fj in a slightly different 
form: 

FjiXj) = -(x;* 1 - f(x?)f "(G(Ay)G(A?) r )- 1 - f(jq: 

+ x - (x;+ 2 - /(x; +1 )) T (g(x; +1 )g(x; +1 ) t ) _1 (x r ; +2 - f(x? 



+ i [x] +r - fiX^-^Y (G(X J n+r - 1 )G(Xj +r - 1 ) T ) 1 (x™ +r - f{X] +r ~ l ) 

1 T 



+ -(h(X^)-b n+r )) (QQ 1 ) (h(X? +r ) - b n+r )J + Zj, (9) 

where Zj is a positive constant that can be computed from the normalization 
constants of the pdf's in the definition of Fj in ([5]). This constant need not 
be computed because it drops out in §§§ when 4>j = minFj. 
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Our construction can be readily generalized to SDE integration schemes 
with intermediate random steps. For example, suppose one is integrating 
the SDE (P) with additive noise, i.e. g(x n ,t n ) = g = constant, using the 
Klauder- Petersen scheme 1231: 



x n+i,* = xn + Sf^ + gy/SAWt, (10) 

x n+1 = x n + ^{f(x n ) + f(x n+1 '*))+9V6^W 2 , (11) 

where AVl^i, AW2 are m dimensional Gaussians with mean zero and variance 
/. For simplicity, assume that observations b n = h{x n ) + 773, with 773 ~ 
A^(0, s), are available at every step. Dropping the index of the particles, the 
probability of the pair (X n+1 '*, X n+1 ) is proportional to exp(— F), with 

1 1 X n+1,* _ X n _ Sf(X n ) || 2 || X n+1 - X n - § (f(X n ) + f(X n+1 '*) 



26g 2 25g 2 
+ ^)-»W\ z, (12, 

where the norm ||x|| = V x T x is the Euclidean norm. All one has to do then 
is solve ([6]) for the pair (X n+1 '*, X n+1 ), with a sample £j, drawn from a 2m 
dimensional Gaussian reference density, on the right-hand-side. 

The effectiveness of the filter rests on one's ability to solve the basic 
equation ([6]) efficiently This equation is underdetermined - it is a single 
equation connecting the m ■ r components of Xj to the reference variable £. 
Each solution algorithm defines a map from £ to Xj, and one has a great 
deal of freedom in choosing this map. Effective algorithms take advantage 
of this freedom. The conditions that the map must satisfy were derived and 
explained in [5]: the map should be (i) one-to-one and onto with probability 
one (so that the whole sample space is covered); (ii) smooth near the high- 
probability region of £ (so that the weights do not vary unduly from particle 
to particle); (iii) it should map the neighborhood of zero onto a neighborhood 
of the minimum of F, and (iv) there should be an easy way to evaluate the 
Jacobian \detdX/d£\. In our experience, condition (iv) is often the most 
onerous to satisfy in nonlinear problems. 

3 Solution of the implicit sampling equation via a 
random map 

A solution algorithm for equation ([6]) defines a map from £ to the sample Xj . 
This map is not unique and should satisfy conditions (i-iv) above. Various 
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ways to solve ([6]) have been presented in [5j[6] . In this section, we present a 
map that is random. 

First, we need to find the additive factor <pj = mm Fj in ©. We propose 
to find mhi-Fj using standard tools, e. g. Newton's method. It is also im- 
portant to note that the Hessian of Fj typically has a sparse block structure. 
This sparsity depends on the integration scheme we use for the discretiza- 
tion of the underlying SDE and can be exploited in the implementation of 
the algorithm. In the examples in sections H] and we present strategies 
for obtaining a "good" initialization for the minimization and then use a 
few straightforward Newton steps to polish the initial guess. We had no 
problems with this approach in the examples we considered. However, a 
quasi-Newton method can be used if the Hessian of Fj is out of reach. More 
sophisticated minimization strategies, e.g. a trust-region method, may be 
preferable in other applications. 

In the present paper we assume that Fj is convex and the Hessian evalu- 
ated at the global minimum <pj is nonsingular. In [5] we discussed what to do 
when this is not the case, in particular we presented strategies for replacing 
F by a convex function without bias or loss. The methods presented here 
are compatible with the constructions in [5]. 

To find a sample Xj, we solve ([6]) via the random Ansatz: 

Xj = fij + XjLjrjj, (13) 



where rjj = S,j is a sample of the Gaussian reference variable 

£ ~ M(0, 1) and fj,j is the location of the minimum of Fj, i.e. fij = argmin Fj. 
The invertible rm x rm matrix Lj is deterministic, under our control, and 
remains to be chosen (see below). By substitution of (fT3|) into ([6]) we obtain 
a single algebraic equation in a single variable Xj. The equation can be 
readily solved and its solution defines the sample Xj. A data assimilation 
problem of arbitrary dimension thus boils down to a minimization of a known 
function followed by the solution of an algebraic equation in one variable. 
When Aj(£) 7^ is continuous, the map () 1 3[) is one-to-one and onto almost 
surely so that requirement (i) is satisfied. 

The process of finding a sample via the random map f)13|) can be in- 
terpreted geometrically. Assuming that the level sets of Fj are closed, the 
algebraic equation (|6|) has a solution in every direction. We generate a ran- 
dom direction by sampling the reference density and computing 77, which is 
uniformly distributed on the unit sphere. We determine how far we need to 
walk along the random direction 77 to hit the level set F(Xj) = <pj +0.5£j£j 
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by solving © with the map (|13p . The matrix Lj is used to incorporate prior 
information. The geometry of the map is illustrated in figure [TJ 




Figure 1: Geometry of the random map Xj — > £. 



What remains to be done is compute the Jacobian of the map. The 
easiest way to do this calculation is as follows. We first rewrite (|13p as 



x j - t'.i + ^j L j £?> 



with Xj = From (fT41) . we compute the derivatives 



(14) 



(15) 



where we droped the index j for the particles for convenience and where 
d\/d^ denotes the gradient (a row vector) of the scalar A with respect to 
the reference variable £ (a column vector). By the chain rule, we obtain 



<)X dXdp 2 £ T 
<9£ dp <9£ dp 



where p = £ £, and substitute the result into ([15]) to get 



(16) 



(17) 
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Using standard rules for determinants such as det (A + BC) = det A-det(I + 
CA~ 1 B), we calculate the Jacobian: 



J = I det LI 



Substitution of A for A finally gives 
J = 2|detL| 



dp 



8\ 

\rm-l 

(9/9 



(18) 



(19) 



A formula for the scalar derivative dX/dp can be obtained by implicit dif- 
ferentiation of (El) combined with (1131) 



1 



dp, 2(VF,)L]V 



(20) 



where V.F,- denotes the gradient of Fj (an rm-dimensional row vector) . Al- 
ternatively, d\j/dpj can be computed numerically by putting A = A + A A, 
computing a new Xj + AXj using Eq. (113f) . followed by evaluation of the 
left hand side of Eq. © to get p + Ap, and differencing. The Jacobian (fl~9j) 
can thus be evaluated readily and condition (iv) in section [2] is satisfied. 
From (|8|), we compute the weights attached to each particle: 



oc vf- exp(- 



b j)\ det Lj\ p- 



l—rm/2 



_ x dXj_ 



(21) 



We now need to choose the matrix Lj . In the examples we considered (see 
sections 4 and 5), the filters performed poorly with the naive choice Lj = I. 
To understand why, suppose that observations are linear and available at 
every step. Equation ([2]) becomes 



b n = AX n + QV n , 



(22) 



where A is a real k x rm matrix. The implicit filter takes on a simple 
structure since, from Eq. ([5]), we get 



with 



F j( x ) = \( x ~ / i i) Ts i \ x ~ Mi) + 4>j, 
EJ 1 = (G(X;,t n )G(X",t n ) T )" 1 + ^ T (QQ T )-^, 



(23) 
(24) 
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H = Z j ((G(X^t n )G(Xj,t n ) T r 1 R(X?,t n )+A T (QQ T r 1 b n+1 ), (25) 
Kj = AG(X?,t n )G(XJ,t n ) T A T + QQ T , (26) 

0. = i(5" +1 - AE^f 1 )) 1 !^ 1 ^ 1 - AR(Xf,t n )), (27) 
With Lj = /, we substitute the random map (fT3|) into ([6]) to find 

and the Jacobian 

J«exp(-^)(r ? JST 1 % -)- n/2 . 

Since T,j is symmetric, the values that the random variable J can take on 
are bounded above and below. The Jacobian J can vary dramatically from 
one sample (of £, respectively rf) to another, especially if the largest and 
smallest eigenvalues of E,- are separated by a large gap. As an example, an 
approximation of the pdf of J for n = 2 and X -1 = diag(l, 0.5) is shown in 
figure EJ The pdf has two peaks, at the left and right ends of the interval 
over which J is defined. This interval is determined by the eigenvalues 
of S and, in this example, J can can take on any value in the interval 
[0.5,1]. Choosing Lj = I produces a Jacobian that can vary significantly 
from particle to particle. The goal however is to make the weights as uniform 
as possible. 

If we choose Lj such that Sj = LjLj, we find Xj = ^JpJ and J = det Lj. 
This Jacobian can be expected to be roughly constant as long as the particles 
are reasonably close to each other, as they are expected to be with our filter. 
In the special case of additive noise, i.e. G(x n ) = G = const, in ([1]), and 
linear observations available at each point in time, the Jacobian J = det Lj 
is constant and need not be computed. In fact, the implicit filter with 
random maps is, for this special case and with this choice of Lj, equivalent 
to optimal importance sampling [1,3,32J. The implicit filter is thus optimal 
in this case in the sense that its weights have minimum variance pp. 

In the general case, we have the Hessian evaluated at the minimum, say 
Hj, at our disposal because we use Newton's method to minimize Fj and 
thus have: 

1 T 

Fj(Xj) = Fj(fij) + -(Xj - (lj) 1 Hj(Xj - fij) + higher order terms. (29) 

Choosing Lj so that HJ = Lj Lj is a good choice, especially if Fj is 
quadratic or nearly so. This choice of Lj also suggests a good initializa- 
tion for the numerical computation of the parameter A in the random map 



10 



14 



12 
10 




Figure 2: Probability density function of the Jacobian for n = 2 and a 
given E. 



(fl~3j) . One can expect A to be on the order of yfp and one chooses A° = ^fpj- 
In all the examples below, the minimization in ([7]), as well as the iterative 
computation of A converged after a few steps with this set-up. 

It is also interesting to compare the random map implementation of the 
implicit filter to an implementation outlined in [5]. There, the function 
Fj(Xj) is replaced by its quadratic approximation 

FfiXj) = Fj(fij) + l(Xi - HjfHjiXj - (30) 

Instead of solving Q, one solves Fj(Xj) — cj)j = 0.5£j£j, where $ = minF? 
can be computed by formulas similar to f|24j> — (|2Tjl . The solution of this 
approximate equation can be obtained by a Cholesky decomposition of Hj 
(the Hessian of Fj at the minimum) and the Cholesky decomposition also 
yields the Jacobian J. A reweighting of the particles to account for the fact 
that one solves an approximate equation rather than ([6]) gives the weights 

w] +1 oc w] exptyj ) • exp (Fj (Xj )-Ff(X j ))-J. (31 ) 
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The extra term exp(F i (X J )-F J °(X j )) can produce low weights if the quadratic 
approximation of iy is not close to Fj in the neighborhood of the sample Xj. 
The random map (|13|) eliminates this factor because one solves equation ([6]), 
rather than an approximate equation. We traded an iterative solution of a 
scalar equation for possibly small weights due to a quadratic approximation. 

4 Filtering a stochastic Lorenz attractor 

The stochastically driven Lorenz attractor [27] has been used as a testbed 
for data assimilation algorithms on many occasions [4,28,29j. We follow 
this trail and test the implicit filter on the stochastic Lorenz attractor with 
additive noise 

dx = cr(y - x)dt + gxdWt, (32) 
dy = (x(p - z) - y)dt + g 2 dW 2 , (33) 
dz = (xy - (3z)dt + g 3 dW 3 , (34) 

with the standard parameters a = 10, p = 28, /3 = 8/3, and initial conditions 
s(0) = -5.91652, y(0) = -5.52332, z{0) = 24.5723. The noise is chosen 
equally strong for all variables. Specifically, we choose gi = g 2 = g 3 = g = 

4.1 Discretization of the dynamics 

We discretized the continuous equations by the Klauder- Petersen (KP) scheme 

m 

x n+l,* = x ^ + Sf( x n )+ gAW!, (35) 
x n+1 = x n + ^(f( x n ) + f( x ^*))+gAW 2 , (36) 

where AWi,Aiy2 ~ M(0,5I) and where /(•) can be read off the Lorenz 
attractor (|32p - (|34p . The scheme is second-order accurate for g = 0. With 
g ^ (additive noise), the scheme is of strong order 1, i.e. the mean of the 
error at t = t n is bounded by Const • 5. Figure ([3|) shows the convergence of 
the KP scheme for the stochastic Lorenz attractor after one dimensionless 
time unit. The graph (black line) shows the mean of the error as a function 
of the time step. The error was approximated by the difference between 
the solution with time step 5 and the reference solution with time step 
5 re f = 2~ 12 . The mean of the error norms (not the difference in mean error!) 
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Figure 3: Convergence of Klauder-Petersen scheme and Runge-Kutta 
scheme for the stochastic Lorenz attractor. 



was approximated by running 1000 experiments and averaging. We observed 
the expected first-order decay in the mean of the error. For comparison, we 
also computed the convergence of a fourth-order Runge-Kutta (RK) scheme, 
where we added a Gaussian with variance Sg 2 I after each full step. For 
g = 0, this scheme is fourth-order. For g ^ 0, it is of strong order 1 (because 
no integrals of the BM are evaluated [24]). We ran 1000 experiments with 
g = y/2 to approximate the mean of the error and observed the expected 
first-order stochastic convergence (light-grey line in figure [3]). The stochastic 
orders of convergence for the KP or RK schemes were thus no better than 
that of the simple forward Euler scheme [23]. However, the forward Euler 
discretization could not follow the solution of the SDE for large integration 
times, because of its low accuracy in the deterministic part. The situation 
is illustrated in figure [3J The forward Euler discretization diverged after 
roughly 2 dimensionless time units, while the KP and RK schemes converged 
for the significantly longer integration time of 12 units. We should point out 
that figure [4] shows only one representative model run and that each scheme 
evolved under a different BM. 

In data assimilation applications, one should aim at a scheme that is of 
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Figure 4: Discrete time approximation of the z-variable for a given BM and 
using the Euler scheme (top), Runge-Kutta 4 (middle) and Klauder- Petersen 
scheme (bottom). 
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low order (to make the filter efficient and fast), but at the same time catches 
the dynamics of the underlying SDE (the observation data may be incom- 
patible with an under-resolved discretization). For the Lorenz attractor, 
the high deterministic accuracy of the RK scheme appeared unnecessary. 
The simpler KP scheme yielded a comparable stochastic convergence and 
followed the solutions of the underlying SDE for long enough. We were thus 
content with the KP scheme and a time step 5 = 0.01. 



4.2 Filtering results 

We start by considering a case with observations of all three state variables at 
every time step. The observations were corrupted by noise with variance 0.1, 
i. e. we chose Q = V0.1I in ([2]). We applied the implicit filter as explained 
in section El The function Fj , as given by equation (I12D , was minimized by 
Newton's method, initialized by a model run of one step without noise. 
At each step, we sampled a 6 dimensional standard normal variate (the 



reference variable) and computed the random direction rjj = £/y to be 

used in the random map. As explained in section [3j we chose Lj in (|13[) to 
be a Cholesky factor of the Hessian evaluated at the minimum. Substitution 
of the map (fl~3j) into the algebraic equation ([6]) gave the required equation 
for Aj, which we solved by a Netwon method. The iteration was initialized 
by choosing A!- = ^/pj and typically converged within 4-6 steps. Finally, we 
computed the weight of the particle using (|2ip and the numerical derivative 
dX/dp, with a perturbation A A = lO -5 ^^. We repeated this process for 
each particle and resample with "algorithm 2" in [I] . We decided to resample 
at every time an observation becomes available. 

We compared the implicit filter with an SIR filter [TJ[9l[T3]. To that 
effect, we ran 1000 twin experiments. That is, we ran the model for 1200 
time steps to produce artificial observations corrupted by the assumed noise. 
This model run was the reference we wished to reconstruct using the SIR and 
the implicit filters. For each experiment, the error at time t n is measured 
by 

e n = \\x? ef - x n \\, (37) 

where the norm is Euclidean, x™ e j is the reference state, and x n is the recon- 
struction by a filter. We computed this error after 5, 10 and 12 dimensionless 
time units (i.e. after 500, 1000 and 1200 steps) for both filters. We then 
computed the mean value of the error norms (mean error, for short) and 
the mean of the variance of the error norms (mean variance of the error, for 
short). The mean of the error norm is a better estimate of than the mean 
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error because it does not allow for cancellations. The mean variance of the 
error is not the variance of the mean, it is a fair estimate of the error in each 
individual run. Our results are in table [TJ 

Table 1: Results for observations of the state at every time step. 
# of Particles Mean error / mean variance of the error, implicit filter 





t = 5 


t = 10 


t = 12 


5 


0.4146/0.2624 


0.4369/0.3687 


0.4270/0.3216 


10 


0.3215/0.1351 


0.3289/0.1391 


0.3311/0.1690 


20 


0.2783/0.0979 


0.2822/0.1018 


0.2866/0.0991 


30 


0.2691/0.0914 


0.2728/0.0931 


0.2688/0.0908 




Mean error / 


mean variance of 


the error, SIR filter 




t = 5 


t = 10 


t = 12 


5 


0.7915/1.8066 


1.1751/4.0425 


1.2544/4.3517 


10 


0.4464/0.6503 


0.0511/0.9587 


0.4158/0.4158 


20 


0.3159/0.1783 


0.3196/0.2920 


0.3156/0.1815 


30 


0.2798/0.1016 


0.2838/0.1013 


0.2810/0.0999 


50 


0.2695/0.0910 


0.2688/0.0919 


0.2711/0.0913 



Table [T] shows that the implicit filter produced a small mean error and 
small mean variance of the error with 20-30 particles. We can also see that 
the statistics converged with about 20 particles. With 20 particles the mean 
error variance is of the order of the variance of the observations. Even 10 
particles yielded good results. The SIR filter required about 50 particles 
to yield comparable accuracy. For either filter, we observed a significant 
increase in the mean error and mean variance of the error with increasing 
time if the number of particles is too low (about 5 for implicit filter, about 20 
for SIR filter) . The increase in mean error and mean error variance was due 
to sample impoverishment: as time progresses, the quality of the particle 
ensemble decreased, i. e. more and more particles had low weights. The 
effects of sample impoverishment were less severe for the implicit filter than 
for the SIR filter. 

Table [2] shows error statistics for the SIR and implicit filters when obser- 
vations of the x-variable only are available, i. e. the observations are dense 
in time, but "sparse in space." The observations were corrupted by noise 
with variance 0.1. The results are qualitatively the same as above. The 
implicit filter required about 20 particles, while the SIR filter needed about 
50 particles for comparable accuracy. 
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Table 2: Results for observations of the variable x n at every time step. 



# of Particles Mean error / mean variance of the error, implicit filter 





t = 5 


t = 10 


t = 12 


5 


0.4846/0.2624 


0.4369/0.3687 


0.4270/0.3216 


10 


0.3215/0.1351 


0.3284/0.1391 


0.3311/0.1609 


20 


0.2783/0.0979 


0.2822/0.1018 


0.2806/0.0991 


30 


0.2691/0.0914 


0.2728/0.0931 


0.2688/0.0908 




Mean error / 


mean variance 


of the error, SIR filter 




t = 5 


t = 10 


t = 12 


5 


0.7915/1.8066 


1.1750/4.0425 


1.2544/4.3517 


10 


0.4464/0.6503 


0.5011/0.9587 


0.4311/0.4158 


20 


0.3159/0.1783 


0.3196/0.2920 


0.3156/0.1815 


30 


0.2798/0.1016 


0.2838/0.1013 


0.2810/0.0999 


50 


0.2693/0.0910 


0.2688/0.0919 


0.2711/0.0913 



Finally, we considered the case of observations that are sparse in time. 
Observations of all three state variables, corrupted by noise with variance 
0.1, became available every 0.48 dimensionless time units (every 48 steps). 
This is a hard data assimilation problem and some filters miss transitions 
from one wing of the Lorenz butterfly to the other |28j . 

The larger dimension of this problem required an additional tweak of the 
algorithm. The problem is of dimension 288: 3 dimensions for the Lorenz 
attractor, times 2 for the intermediate step x* of the KP scheme, times 48 
for the gap between observations. If the variance matrix of the reference 
variable £ is the identity matrix /, we are expressing a vector variable Xj of 
small variance as a function of a unit reference variable, and this produces 
very small Jacobians J which can lead to underflow. One solution is to 
rescale £ which, after all, is arbitrary. What we did instead is keep track of 
the logarithms of the weights rather than the weights themselves wherever 
we could; this solved the problem. 

We ran 1000 twin experiments with this algorithm. Table [3] shows the 
error statistics. Results of one of the twin experiments are shown in figure[5j 
The implicit filter yielded good results with 20-30 particles, while an SIR 
filter required about 100 particles for comparable accuracy (our results for 
the SIR filter are in agreement with those reported in j4j[28]). We had 
problems with our minimization algorithm for gaps that exceed 0.5 time 
units. A more sophisticated initialization or a more robust minimization 
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Reference 



x Observations 
- — Reconstruction by implicit titer 



10 12 14 16 18 




Time 



Figure 5: A twin experiment with an implicit filter and 10 particles. 



18 



Table 3: Results for sparse observations of the state. 



# of Particles 


Mean error / mean 


variance of the error for implicit filter 




t = 4.8 


t = 9.6 


5 


0.1924/0.1750 


0.2192/0.3457 


10 


0.2101/0.4103 


0.2317/0.4905 


20 


0.1676/0.0523 


0.1927/0.1646 




Mean error / mean 


variance of the error for SIR filter 




t = 4.8 


t = 9.6 


10 


0.6508/1.0093 


0.9964/1.9970 


20 


0.4313/0.4663 


0.5352/0.7661 


50 


0.3368/0.2594 


0.4271/0.5445 


100 


0.2156/0.0929 


0.2336/0.1229 



can provide a cure. A detailed discussion of these issues will be taken up in 
future papers. 

4.3 Discussion 

The SIR and implicit filters both worked well on the stochastic Lorenz at- 
tractor and, with a sufficient number of particles, reconstructed the reference 
solution reliably. In all cases we considered, we observed that the implicit fil- 
ter required fewer particles than the SIR filter to give a comparable accuracy. 
We observed that the "focussing" of the particles towards the observations 
was most beneficial when the gap between observations is large. This is 
indicated by the larger number of particles required in SIR than in the im- 
plicit filter. The reason is that the unguided SIR particles are very likely 
to become unlikely when the gap between observations is large, so that the 
SIR importance density and the target density can become nearly mutually 
singular [21[5j[38]. The particles of the implicit filter on the other hand are 
guided towards the observations because they are generated by solving ([6]), 
which incorporates information from the available data. 

The computational cost of these filters is comparable for the examples of 
the stochastic Lorenz attractor. The implicit filter requires fewer particles, 
but the computations for each particle are more expensive when compared 
to the SIR filter. The random map solution of the algebraic equation © is 
efficient and reliable. 
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5 Filtering a stochastic Kuramoto-Sivashinsky equa- 
tion 



The Kuramoto-Sivashinksy equation [25, 37J is a chaotic partial differen- 
tial equation that models laminar flames or reaction-diffusion systems (see 
[Hl[15]). Recently, stochastic Kuramoto-Sivashinsky (SKS) equations have 
also been used as a large dimensional test-problem for data assimilation al- 
gorithms [Hi 18] ■ We follow in these footsteps and test the implicit filter with 
random maps on the SKS equation 

Ut + uu x + u xx + vu xxxx = g W(x, t) (38) 

where v > is the viscosity, g is a scalar and W(x,t) is a stochastic process. 
We restrict ourselves to the strip x 6 [0, L], t > and consider the case of L- 
periodic boundary conditions. Expanding the solution into a Fourier series 
u(x,t) = Yl Uk(t) exp(icokx) transforms (f3"5j) into an infinite dimensional 
stochastic ordinary differential equation of the form 

dU = {A{U) + F{U))dt + g dW t , (39) 

where A is a diagonal linear operator and where J- is a nonlinear operator. 
We assume that the noise process dWt is a cylindrical Brownian motion [20] , 
i.e., there exists a sequence q n > 0, n > 1, of positive real numbers, and a 
real number 7 £ (0, 1) such that 

00 

X>^-V<oo, (4Q) 
n=l 

where X n are the eigenvalues of A in (|39|) . Let /?" be independent BM's. 
The cylindrical BM dWt is given by the infinite series 

00 

dWt = J2V^e n ff, (41) 

n=l 

where the e n are unit vectors. The coefficients q n control the continuity of 
the noise process dWt in space. For example, q n = 1 for all n corresponds 
to space-time white noise, and exponentially decaying q^s make the noise 
continuous in space while it remains white in time. We will consider two 
noise processes to drive the KS equation, namely space-time white noise 
and, following [H [71 [26JE6], spatially smooth noise with q n — sxp( — |w n |). 
In figure (U a realization of space-time white noise is shown in comparison 
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with a realization of a spatially smooth noise process. The absence of any 
correlation in space or time is evident from the left panel of figure [6l The 
right panel illustrates the continuity of the noise in space at a fixed time. 

A projection of equation (j39|) onto an m-dimensional subspace spanned 
by m Fourier modes U).(t) yields an m-dimensional Ito-Galerkin approxima- 
tion of the SKS equation 

dU =(C(U) +Af(U))dt + g dW t m , (42) 

where U is a finite dimensional column vector whose components are the 
Fourier coefficients of the solution and where dW™ is a truncated cylindrical 
BM, obtained by projection of the cylindrical BM dWt into the Fourier 
modes. Assuming that the initial conditions u(x,0) are odd with Uq(Q) = 
and that dW™ is imaginary, all Fourier coefficients Uf~{t) are imaginary for 
all t > 0. Writing Up, = iUk and subsequently dropping the hat gives 

C{U) = diag(^ - uui)U, (43) 

m 

{M{U)} k = -^ Yl U k'U k -k>, (44) 

k'=— m 

where ijJk = 2irk/L, k = 1, . . . ,m and {Af(U)} k denotes the k th element of 
the vector ftf(U). We choose a period L = 16n and a viscosity v = 0.251, 
to obtain SKS equations with 31 linearly unstable modes. This set-up is 
similar to the SKS equation considered in [18] . With our parameter values 
there is no steady state as in [3]. We chose zero initial conditions U(0) = 0, 
so that the solution evolves solely due to the effects of the noise. 

5.1 Numerical integration and convergence 

The numerical integration of (142)) is more delicate than for the Lorenz at- 
tractor. The forward Euler scheme is unstable for any reasonable time step 
so that one must consider more sophisticated schemes to discretize (|42f) . see 
e. g. [20,24,26,30,31]. We found that fully implicit schemes, for example im- 
plicit Euler or an implicit 1.5-strong-order scheme, are numerically awkward 
for the SKS equation (and, in fact, for most high dimensional problems). The 
exponential Euler scheme [20] can be thought of as a stochastic version of 
exponential time differencing [HJ and is tailor-made for nonlinear equations 
whose stiffness arises from their linear parts. While the scheme is only first 
order in time, competing schemes, for example the linear-implicit Euler or 
the Lord-Rougemont [26j schemes, converge even slower. Similar observa- 
tions were made in [20]. Taking into account both the time discretization 
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and space truncation error, the exponential Euler scheme appeared superior 
to other schemes we considered. For the SKS equation, this scheme takes 
the form 

U n+l = e BS U n +B-\e B5 -I)M(U n )+g^/(l5DB I ^^~^I)AW n+1 (45) 

where B = diag(u;^ — vujfy and D = diag((?fc). Note that all the matrices are 
diagonal, so that the numerical integration can be implemented efficiently, 
even if m is large. In the notations of Eq. (|3|), we write 

R{U n ,t n ) = =e B6 U n + B~\e BS - I)M(U n ), (46) 
G{U n ) = g^.f>QB- l {e 2m - 1). (47) 

To assess the convergence of the exponential Euler scheme we calculated 
a very accurate reference solution with a time step of 2 -12 and compared it to 
approximations with varying time steps. The number of Fourier modes was 
held fixed: 512 in the case of spatially smooth noise and 1024 in the case of 
space-time white noise. The mean error was approximated as the average of 
||ti(x,T) — u re j(x, T)\ | over 2000 experiments, where || • || was the Euclidean 
norm, T = 3, g = 4, and u re f was the reference solution. Figure ([7]) shows 
the results. For spatially smooth noise we observed a convergence rate of 
about one, as expected. The scheme converged slower when we made the 
noise white in space, i. e. increased the noise in high frequency modes. The 
exponential Euler scheme converged when the noise is white in space and 
time because the elements of the diagonal matrix multiplying the BM in 
(UJ became smaller as the number of Fourier modes increases (see equation 
(f47|) ) . Figure M shows the results of one of our experiments and indicates 
that the discretization follows the solution of the SPDE long enough for our 
purposes. 

We were content with a time step 5 = 2~ 10 and m = 128 modes for 
spatially smooth noise, and 5 = 2~ 10 , m = 512 for space-time white noise. 

5.2 The observations 

We are solving the SKS equations in Fourier variables, but we choose to ob- 
serve in physical space (as is maybe physically reasonable). The solution of 
the algebraic equation ([6]) is easiest when the function F is nearly diagonal, 
i.e., when its linearizations around a current state are nearly diagonal matri- 
ces; this requires in particular that the variables that are observed coincide 
with the variables that are evolved by the SDE. Observing in physical space 
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Spatially smooth noise 



Space-time white noise 




Figure 7: Convergence of the exponential Euler scheme. Left: spatially 
smooth noise. Right: space-time white noise. Solid line: mean error at 
T = 20. Broken line: order line 1.0. 
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Figure 8: Discrete approximation of u(x = 0.3, t) for different noise processes 
and different time steps. For each case, the numerical approximations share 
the same realization of the noise. 
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while computing in Fourier space creates the opposite situation, in which 
each observation is related to the variables one computes by a dense matrix. 
Our solution of this problem demonstrates the effectiveness of the random 
map algorithm. 

Specifically, we collected observations h(u{x, t)), corrupted by noise with 
unit variance, at the discrete locations x\,... ,x m / 2 - Equation ([2]) becomes 

b n = h((u( Xl ,t n ), . . . , u(x m/2 , t)f = h(-2lm(EU n )) + V n , (48) 
where V n ~ A/"(0, /) and where E is a (2n + 1) x m matrix with rows 

For simplicity, we chose to collect the data at m/2 equidistant locations. 
5.3 Numerical results 

To test the implicit filter we ran twin experiments as in section 14.21 The 
error at time t n is defined as 

e n = \\U? ef -U%\\ (49) 

where the norm is the Euclidean norm || x | | = V x T x; U™ e j denotes the set of 
Fourier coefficients of the reference run and Up denotes the reconstruction 
by the filter, both at the fixed time t n . Tabled] shows the results of 500 twin 
experiments for n = 100, g = 4 and with linear observations /i(x") = x™ 
at every step. The results are graphically summarized in figure [HI Since 
G in ([3]) and Q in (J2]) are independent of the state or time, the Cholesky 
factorization could be done off-line, i. e. needed to be computed only once. 
It follows that det L in (|21|) needed not to be computed, since it is the same 
for all particles. 

From table H] and figure [9] we observe that the implicit filter gave very 
accurate results with only ten particles. The error statistics had converged, 
so that it was unnecessary to perform experiments with more than 300 parti- 
cles. The SIR filter collapsed (all weights were zero up to machine precision) 
unless the number of particles was greater than or equal to 50. Experiments 
with 50, 100, 500 and 1000 particles showed that SIR filter could not yield 
an accuracy close to that of the implicit filter with 10 particles. Even with 
1000 particles the SIR filter yielded four times the mean variance of the 
implicit filter with ten particles. 

One can check that with our parameter choices the ratio of model- 
to-observation noise is larger for space-time white noise than for spatially 
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Table 4: Results for linear observations at every step. 



Spatially smooth noise 

# of Particles Implicit filter SIR filter 

10 0.462345/0.217435 -/- 

20 0.455133/0.210594 -/- 

50 0.434861/0.192192 1.47129/2.23284 

100 0.420063/0.179344 1.35330/1.88725 

200 0.41221/0.1725600 -/- 

300 0.40919/0.1700570 -/- 

500 -/- 1.20573/1.498450 

1000 -/- 0.98354/0.995908 

Space-time white noise 

# of Particles Implicit filter SIR filter 

10 0.505932/0.258586 -/- 

20 0.491701/0.244227 -/- 

50 0.473583/0.225954 2.24747/5.11504 

100 0.460124/0.212956 2.05649/4.28421 

200 0.455131/0.208389 -/- 

300 0.452730/0.205857 -/- 

500 -/- 1.68514/2.87233 

1000 -/- 1.57808/2.51565 
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Spatially smooth noise 




500 1000 
Number of particles 



Space-time white noise 



SIR filter 
Implicit filter 



l ::. 1 1 



500 1000 
Number of particles 



Figure 9: Illustration of filtering results for a linear observation operator: 
the error statistics are shown as a function of the number of particles for 
SIR (black) and implicit filter (light grey). 
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smooth noise. The higher level of noise creates more of a problem for the 
SIR filter than for the implicit filter. We observed that the error of the SIR 
filter increased when the SKS equation was driven by white noise, while 
the implicit filter appeared insensitive to the nature of the model noise. 
In our experience, the implicit filter performs well with a large model-to- 
observation noise ratio. 

Next, we consider the nonlinear observation operator h(x) = x + x 3 . As 
in section 14. 2\ the minimization of Fj was done using a model run without 
noise as the initial guess, followed by a few full Newton steps. Results of 
500 twin experiments are shown in table [5] and figure [TUJ We observe from 



Table 5: Results for nonlinear observations at every step. 





Spatially smooth noise 


# of Particles 


Implicit filter 


SIR filter 


10 


0.197085/0.0401874 


7- 


20 


0.192486/0.0383204 


-A 


50 


0.182398/0.0343374 


0.408985/0.175277 


100 


0.178808/0.033115 


0.377034/0.148200 


500 


-/- 


0.332515/0.114040 


5000 


7- 


0.280989/0.082068 




Space-time 


white noise 


# of Particles 


Implicit filter 


SIR filter 


10 


0.133155/0.0181577 


-A 


20 


0.132795/0.0180349 


-A 


50 


V- 


1.54282/2.41919 


100 


V- 


2.05649/4.28421 


500 


V- 


1.52291/2.36136 


5000 


V- 


1.52078/2.35841 



table [5] and figure [10] that the implicit filter outperformed the SIR filter. 
A SIR filter with 5000 particles gave less accurate results than the implicit 
filter with ten particles for either noise process. The results are similar to 
those obtained for a linear observation operator. 

Last, we consider the case of linear observations at every other time step. 
We ran 500 twin experiments. In each experiment we integrated the SKS 
equation driven by smooth noise with g = 1 until t n = 1005. We averaged 
the results to estimate the error statistics. The results are shown in table [6] 
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Spatially smooth noise 



0.6 r- 
0.55 

0.5 - 
0.45 - 

0.4 - 
0.35 - 

0.3 - 
0.25 - 

0.2 
0.15 

0.1 



SIR filter 
Implicit filter 



200 400 
Number of particles 



Space-time white noise 




500 1000 

Number of particles 



Figure 10: Illustration of filtering results for a nonlinear observation opera- 
tor: the error statistics are shown as a function of the number of particles 
for SIR (black) and implicit filter (light grey). 
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and figure [IT] shows of one of the twin experiments. 



Table 6: Results for sparse observations. 



Spatially smooth noise 



# of Particles 



Implicit filter SIR 



10 

20 
500 
1000 



0.391023/0.156071 -/- 
0.384932/0.151217 -/- 

-/- 0.280533/0.080205 

-/- 0.271989/0.075466 



From table El we observe that the implicit filter appeared insensitive to 
the fact that observations were not always available. The error statistics had 
converged. The SIR filter required at least 500 particles to achieve a similar 
accuracy and often collapsed with fewer particles so that a reliable estimation 
of the error statistics was infeasible. The error decreased compared to table H] 
because we decreased the noise in the model by setting g = 1, rather than 



Finally we want to comment on how our results compare to those re- 
ported in [4lll8j. In [4], Chorin and Krause considered a SKS equation with 
two linearly unstable modes and successfully applied a dimensional reduc- 
tion to their SIR filter. In the present paper we chose a viscosity and period 
that yield 31 unstable modes. Thus, the SKS equation in [4] was a lot 
"nicer" and assimilating data was easier. Jardak et al. [18] considered data 
assimilation for a SKS equation with 32 linearly unstable modes, however 
their noise is milder and the numerical integration is carried out differently. 
They compared the performance of an ensemble Kalman filter (EnKF) to 
that of an SIR filter and Maximum Likelihood Ensemble Filter methods 
(MLEF). Only sparse observations of the Fourier coefficients were consid- 
ered. The conclusion was that EnKF outperforms SIR and MLEF for linear 
observations but has major drawbacks for nonlinear observation operators. 
For nonlinear observations, SIR gave the best results. When compared to 
ours, the SIR particle ensembles in |18| were smaller because of the lower 
noise levels and because the Fourier coefficients rather than the physical 
solution were observed. Nonetheless, the number of SIR particles is 70-250 
and thus larger than the 10-50 particles we require for the implicit filter. 
A more detailed comparison of our results to those in [18] is not possible 
because of the different assumptions. 
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= 4. 
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Figure 11: Outcome of a twin experiment with data sparse in space and time. 
Solid black line (almost hidden): reference. Dashed gray line: implicit filter 
with 10 particles. Dashed black line: SIR filter with 1000 particles. 
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6 Conclusions 



Implicit filtering is a sequential Monte Carlo technique for nonlinear, non- 
Gaussian data assimilation. The implicit filter is designed to keep the num- 
ber of particles required manageable by focussing the particles towards the 
high-probability regions. We have presented a new implementation of an im- 
plicit particle filter in which the underdetermined algebraic equation charac- 
teristic of implicit sampling is solved efficiently via a random map. The use 
of the random map reduces a data assimilation problem of arbitrary dimen- 
sion to a sequence of minimizations of explicitly known functions, followed 
by solutions of algebraic equations. 

We applied the filter in our new implementation to two challenging test 
problems where it performed well in comparison with a standard filter. As 
expected, our filter became more economical, compared with alternatives, 
when the dimension of the problem increased or the model noise grew. The 
various numerical issues that arise as the problem size increases even further 
will be discussed in the context of specific applications. 
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